Representation schemes for language data: the Text Encoding Initiative and its potential impact for encoding African languages

Author

  • Nancy Ide

Abstract

The Text Encoding Initiative (TEI) Guidelines for the Encoding and Interchange of Machine-Readable Texts provide standardized encoding conventions for a large range of text types and features relevant for a broad range of applications. Given the potential challenges of encoding texts in the African languages, it will be important to establish collaboration between the TEI and projects encoding language resources in these languages.

Résumé. Les Recommandations pour l'Encodage et l'Echange des Textes Informatisés de la Text Encoding Initiative (TEI) fournissent des conventions d'encodage pour un large éventail de types de textes et d'éléments textuels utiles pour de nombreuses applications. L'encodage des textes africains pose des difficultés particulières, et il est important d'établir des collaborations entre la TEI et les projets qui encodent des ressources linguistiques et textuelles pour les langues africaines.

Similar resources

Building a Knowledge Base of Morphosyntactic Terminology

This paper describes the beginning of an effort within the Linguist List’s Electronic Metastructure for Endangered Languages Data (E-MELD) project to develop markup recommendations for representing the morphosyntactic structures of the world’s endangered languages. Rather than proposing specific markup recommendations as in the Text Encoding Initiative (TEI), we propose to construct an environm...

Data in Your Language: The ECI

In this paper we describe the contents and the method of production of the ACL European Corpus Initiative Multilingual Corpus 1 (ECI/MC1). This is a large multilingual electronic text corpus, containing 97 million words in 27 (mainly European) languages. It is available cheaply on CDROM. Most of the texts in the corpus are marked up using a fully-validated SGML document type description based o...

Identity and Representation through Language in Ghana: The Postcolonial Self and the Other

Research related to colonialism and post colonialism shows how the identities of indigenous people were constructed and how these identities are reconstructed in our contemporary world. The thrust of this paper is that colonialism brought a shift in the linguistic structure of Ghana with the introduction of the use of English among Ghanaians. The coexistence of both Ghanaian languages and Engli...

Encoding standards for large text resources: The Text Encoding Initiative

The Text Encoding Initiative (TEI) is an international project established in 1988 to develop guidelines for the preparation and interchange of electronic texts for research, and to satisfy a broad range of uses by the language industries more generally. The need for standardized encoding practices has become increasingly critical as the need to use and, most importantly, reuse vast amounts of ...

The MULTEXT-East Morphosyntactic Specification for Slavic Languages

Word-level morphosyntactic descriptions, such as “Ncmsn” designating a common masculine singular noun in the nominative, have been developed for all Slavic languages, yet there have been few attempts to arrive at a proposal that would be harmonised across the languages. Standardisation adds to the interchange potential of the resources, making it easier to develop multilingual applications or t...

Publication date: 2008